All Words Domain Adapted WSD: Finding a Middle Ground between Supervision and Unsupervision

نویسندگان

  • Mitesh M. Khapra
  • Anup Kulkarni
  • Saurabh Sohoney
  • Pushpak Bhattacharyya
چکیده

In spite of decades of research on word sense disambiguation (WSD), all-words general purpose WSD has remained a distant goal. Many supervised WSD systems have been built, but the effort of creating the training corpus annotated sense marked corpora has always been a matter of concern. Therefore, attempts have been made to develop unsupervised and knowledge based techniques for WSD which do not need sense marked corpora. However such approaches have not proved effective, since they typically do not better Wordnet first sense baseline accuracy. Our research reported here proposes to stick to the supervised approach, but with far less demand on annotation. We show that if we have ANY sense marked corpora, be it from mixed domain or a specific domain, a small amount of annotation in ANY other domain can deliver the goods almost as if exhaustive sense marking were available in that domain. We have tested our approach across Tourism and Health domain corpora, using also the well known mixed domain SemCor corpus. Accuracy figures close to self domain training lend credence to the viability of our approach. Our contribution thus lies in finding a convenient middle ground between pure supervised and pure unsupervised WSD. Finally, our approach is not restricted to any specific set of target words, a departure from a commonly observed practice in domain specific WSD.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

TreeMatch: A Fully Unsupervised WSD System Using Dependency Knowledge on a Specific Domain

Word sense disambiguation (WSD) is one of the main challenges in Computational Linguistics. TreeMatch is a WSD system originally developed using data from SemEval 2007 Task 7 (Coarse-grained English Allwords Task) that has been adapted for use in SemEval 2010 Task 17 (All-words Word Sense Disambiguation on a Specific Domain). The system is based on a fully unsupervised method using dependency k...

متن کامل

HR-WSD: System Description for All-Words Word Sense Disambiguation on a Specific Domain at SemEval-2010

The document describes the knowledgebased Domain-WSD system using heuristic rules (knowledge-base). This HRWSD system delivered the best performance (55.9%) among all Chinese systems in SemEval-2010 Task 17: All-words WSD on a specific domain.

متن کامل

Knowledge-Based WSD on Specific Domains: Performing Better than Generic Supervised WSD

This paper explores the application of knowledgebased Word Sense Disambiguation systems to specific domains, based on our state-of-the-art graphbased WSD system that uses the information in WordNet. Evaluation was performed over a publicly available domain-specific dataset of 41 words related to Sports and Finance, comprising examples drawn from three corpora: one balanced corpus (BNC), and two...

متن کامل

Knowledge-Based WSD and Specific Domains: Performing Better than Generic Supervised WSD

This paper explores the application of knowledgebased Word Sense Disambiguation systems to specific domains, based on our state-of-the-art graphbased WSD system that uses the information in WordNet. Evaluation was performed over a publicly available domain-specific dataset of 41 words related to Sports and Finance, comprising examples drawn from three corpora: one balanced corpus (BNC), and two...

متن کامل

Bootstrapping Without the Boot

What: We like minimally supervised learning (bootstrapping). Let’s convert it to unsupervised learning (“strapping”). How: If the supervision is so minimal, let’s just guess it! Lots of guesses lots of classifiers. Try to predict which one looks plausible (!?!). We can learn to make such predictions. Results (on WSD): Performance actually goes up! (Unsupervised WSD for translational senses, Eng...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010